Abstract — The Finite Impulse Response (FIR) filter plays an
important role in digital systems. This paper presents a
parallel FIR filter structure, designed using the fast FIR
algorithm (FFA), that exploits symmetric coefficients.
Compared with existing FFA parallel FIR filters, the number of
multipliers saved increases with the length of the FIR filter,
which is significant for the hardware cost of symmetric
convolution. The structure uses the poly-phase decomposition
technique, requires a minimum number of multipliers, and
consumes low power. A multiplier normally consumes more power
than an adder, so trading multipliers for adders gives a lower
hardware cost and lower power than the existing parallel FIR
filter structure.

Index Terms — Finite Impulse Response (FIR), FIR 
Algorithm, Adder, Multiplier, Symmetric Convolution. 


I. INTRODUCTION

The finite-impulse response (FIR) filter has been and
continues to be one of the fundamental processing elements in
any digital signal processing (DSP) system. FIR filters are
used in DSP applications that range from video and image
processing to wireless communications. With the growth of
multimedia applications, the demand for high-performance and
low-power DSP keeps rising. Some applications, such as video
processing, need the FIR filter to operate at high
frequencies. Other applications, such as the
multiple-input multiple-output systems used in cellular
wireless communication, require high throughput from a
low-power circuit. Furthermore, when narrow transition-band
characteristics are required, a much higher filter order is
unavoidable. Parallel processing can meet these demands, but
because its hardware implementation cost increases linearly
with the block size, the parallel processing technique loses
its advantage in practical implementation.

II. LITERATURE SURVEY

Parallel FIR filter structures based on poly-phase
decomposition can reduce the number of multiplications in the
sub-filter section compared with the existing fast parallel
FIR algorithm. Fast linear convolution is used to develop
small-sized filter structures. The symmetric convolution
structure is used to obtain a new hardware-efficient fast
parallel finite-impulse response (FIR) filter structure, which
saves a large amount of hardware cost, especially when the
length of the FIR filter is large. Fast digital filtering is
required in many signal processing applications.

III. PREVIOUS RESEARCH 

A few papers propose ways to reduce the complexity of the
parallel FIR digital filter structure. This research shows
that FIR digital filters can be used for high-speed and
low-power applications. Quantization of the fast FIR digital
filter structure can reduce the number of binary adders by up
to 25%. Parallel FIR filter structures based on poly-phase
decomposition reduce the number of multiplications in the
sub-filter section by exploiting the symmetric coefficients,
compared with the existing FFA fast parallel FIR filter
structure.

The research article by Chao Cheng and Keshab K. Parhi
presents a fast parallel FIR algorithm based on the
mixed-radix algorithm and fast convolution algorithms. This
iterated short convolution (ISC) based linear convolution
structure is transposed to obtain a new hardware-efficient
fast parallel finite-impulse response filter. When it comes to
symmetric convolutions, however, the symmetry of coefficients
has not yet been taken into consideration in the design of
such structures, although it can lead to a significant saving
in hardware cost.

A research article by Yan Sun and Min Sik Kim presents an
approach to implementing a high-performance 8-tap digital FIR
filter using the Logarithmic Number System (LNS). In the past,
FIR filters were implemented in a conventional number system,
and their speed was limited by the multiply-accumulate
operations. The authors realize a fast FIR filter by using the
LNS, which allows multiplication to be implemented simply with
a fixed-point adder. The serious drawback of the LNS, the
conversions to and from the conventional number
representation, is overcome by pipelining to reduce the delay
and complexity of the filter. The critical path is reduced
from a multiply-accumulate operation to an add operation, and
their FIR filter requires 27% less area than the original FIR
filter.

The present work provides new parallel FIR filter structures
based on FFA with advantageous poly-phase decompositions,
which reduce the number of multiplications in the sub-filter
section by exploiting the symmetric coefficients, compared
with the existing FFA fast parallel FIR filter structure.

IV. FAST FIR ALGORITHM 

The traditional L-parallel FIR filter can be derived using
poly-phase decomposition. It uses the parallel processing
technique, which either increases the throughput or decreases
the power consumption.

Assuming {x(i)} and {h(i)} to be the input sequence and the
length-N impulse response of an FIR filter respectively, the
output sequence y(n) and the filter transfer function H(z) can
be written as (1),

y(n) = \sum_{i=0}^{N-1} h(i) x(n-i),  n = 0, 1, 2, ..., \infty    (1)

H(z) = \sum_{n=0}^{N-1} h(n) z^{-n}
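
As an illustration of (1), the following minimal Python sketch evaluates the convolution sum directly. The function name `fir_direct` and the sample values are hypothetical, and input samples before n = 0 are assumed to be zero.

```python
def fir_direct(h, x):
    """Direct-form FIR: y(n) = sum over i of h(i) * x(n - i)."""
    taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for i in range(taps):
            if n - i >= 0:              # samples before n = 0 are taken as zero
                acc += h[i] * x[n - i]
        y.append(acc)
    return y


h = [1, 2, 3, 4]                        # illustrative 4-tap impulse response
x = [1, 0, 0, 0, 2, 0]                  # illustrative input sequence
y = fir_direct(h, x)
```

Each output sample costs N multiplications here, which is the baseline that the parallel structures in this section try to improve on.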

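The traditional L-parallel FIR filter can be derived using
poly-phase decomposition as

\sum_{i=0}^{L-1} Y_i(z^L) z^{-i} = \sum_{j=0}^{L-1} H_j(z^L) z^{-j} \sum_{k=0}^{L-1} X_k(z^L) z^{-k}

where Y_i(z), X_k(z), and H_j(z) are the poly-phase
components of the output, the input, and the filter transfer
function, respectively, and the poly-phase components are
defined as follows,

Y_i(z) = \sum_{m=0}^{\infty} z^{-m} y(Lm + i)
X_k(z) = \sum_{m=0}^{\infty} z^{-m} x(Lm + k)
H_j(z) = \sum_{m=0}^{N/L-1} z^{-m} h(Lm + j)

for i, j, k = 0, 1, 2, ..., L-1.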

This block FIR filtering equation shows that the traditional
parallel FIR filter can be realized using L^2 FIR sub-filters
of length N/L. This linear complexity can be reduced using
various FFA structures. The implementation of (5) is shown in
Fig. 2. This structure has three FIR sub-filter blocks of
length N/2, which require 3N/2 multipliers and 3(N/2 - 1) + 4
adders. From the figure, this filter structure has one
preprocessing and three postprocessing adders.
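
The three-sub-filter structure described above can be sketched in Python as follows. This sketch assumes the standard two-parallel FFA, Y0 = H0 X0 + z^{-2} H1 X1 and Y1 = (H0 + H1)(X0 + X1) - H0 X0 - H1 X1. The names `conv` and `ffa2` are hypothetical, and both sequence lengths are assumed to be even.

```python
def conv(a, b):
    """Full linear convolution of two sequences (reference implementation)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ffa2(h, x):
    """Two-parallel fast FIR sketch; len(h) and len(x) must both be even."""
    h0, h1 = h[0::2], h[1::2]                  # poly-phase components of the filter
    x0, x1 = x[0::2], x[1::2]                  # poly-phase components of the input
    h_sum = [a + b for a, b in zip(h0, h1)]    # H0 + H1, fixed once per filter
    x_sum = [a + b for a, b in zip(x0, x1)]    # the single preprocessing addition

    p0 = conv(h0, x0)                          # sub-filter 1: H0 X0
    p1 = conv(h1, x1)                          # sub-filter 2: H1 X1
    ps = conv(h_sum, x_sum)                    # sub-filter 3: (H0 + H1)(X0 + X1)

    # Postprocessing: Y1 = (H0 + H1)(X0 + X1) - H0 X0 - H1 X1
    y1 = [s - a - b for a, b, s in zip(p0, p1, ps)]
    # Postprocessing: Y0 = H0 X0 + H1 X1 delayed by one block
    y0 = p0 + [0]
    for m, v in enumerate(p1):
        y0[m + 1] += v

    y = [0] * (len(h) + len(x) - 1)
    y[0::2] = y0                               # even-indexed output samples
    y[1::2] = y1                               # odd-indexed output samples
    return y


h = [1, 2, 3, 4]
x = [1, 2, 3, 4, 5, 6]
y = ffa2(h, x)
```

The result matches the direct convolution `conv(h, x)` while using three length-N/2 sub-filters instead of the four needed by the traditional two-parallel decomposition.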



Fig 1: Traditional Parallel FIR Filter

The hardware implementation of the three-parallel FFA
requires six length-N/3 FIR sub-filter blocks, three
preprocessing and seven postprocessing adders, which reduces
the hardware cost.



As can be seen from the example above, two of the three
sub-filter blocks in the proposed two-parallel FIR filter
structure, H0-H1 and H0+H1, now have symmetric coefficients,
which means each such sub-filter block can be realized as in
Fig. 4 with only half the number of multipliers. Each
multiplier output serves two taps. Note that the transposed
direct-form FIR filter is employed. Compared with the existing
FFA two-parallel FIR filter structure, the proposed FFA
structure yields one more sub-filter block that contains
symmetric coefficients. However, this comes at the price of an
increased number of adders in the preprocessing and
postprocessing blocks. In this case, two additional adders
are required.
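
The multiplier saving can be checked with a small Python sketch. It assumes an even-length symmetric filter, h(n) = h(N-1-n), and uses a direct-form pre-addition rather than the transposed form used in the text. The helper names and coefficient values are hypothetical. In this example H0+H1 is symmetric and H0-H1 is antisymmetric, that is, symmetric up to sign, so both can share one multiplier between two mirrored taps.

```python
def is_symmetric(c):
    return c == c[::-1]


def is_antisymmetric(c):
    return c == [-v for v in c[::-1]]


def symmetric_output(c, window):
    """One output sample of a symmetric filter, where window[i] holds x(n - i).

    Mirrored taps are added first, so an even-length filter needs only
    len(c) // 2 multiplications per output sample.
    """
    n = len(c)
    acc = 0
    for i in range(n // 2):
        acc += c[i] * (window[i] + window[n - 1 - i])
    if n % 2:                                   # middle tap of an odd-length filter
        acc += c[n // 2] * window[n // 2]
    return acc


h = [1, 3, 5, 7, 7, 5, 3, 1]                    # symmetric filter, N = 8
h0, h1 = h[0::2], h[1::2]                       # poly-phase components
h_sum = [a + b for a, b in zip(h0, h1)]         # H0 + H1
h_diff = [a - b for a, b in zip(h0, h1)]        # H0 - H1

sample = symmetric_output(h_sum, [1, 2, 3, 4])  # two multiplications, four taps
```

An antisymmetric sub-filter would use a subtraction in place of the pre-addition, with the same multiplier count.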

A. Parallel Processing FIR Filters for High Speed or Low Power

It is well known that applying parallel processing to an FIR
filter can increase its throughput. If a parallel filter is
operated at the same clock rate as the original filter, L
output samples are generated every clock cycle, compared with
the single output sample produced every clock cycle in the
original filter. This implies that the L-parallel filter
effectively operates at L times the rate of the original FIR
filter. While it is clear that parallel processing can
increase the throughput of an FIR filter, the technique can
also be used to reduce its power consumption. This fact is
often overlooked. The application of parallel processing
facilitates the lowering of the supply voltage, which in turn
leads to a decrease in the power consumption [13, 15]. Let
P_0 = C_0 V_0^2 f_0 represent the power consumed in the
original FIR filter, where C_0 is the capacitance of the
original filter, V_0 is the supply voltage of the original
filter and f_0 is the clock frequency of the original filter.
It should be noted that f_0 = 1/T_0, where T_0 is the clock
period of the original filter. In order to maintain the same
sample rate, the clock period of the L-parallel filter must be
increased to L T_0, since L samples are produced every clock
cycle. This means that C_0 is charged in time L T_0 rather
than in time T_0. In other words, there is more time to charge
the same capacitance (see Figure 1). This implies that the
supply voltage can be lowered to \beta V_0, where \beta is a
positive constant less than 1. By examining the propagation
delay of the original and parallel filters, the power supply
reduction factor, \beta, can be determined. The propagation
delay of the original circuit is given by

T_{pd} = \frac{C_0 V_0}{k (V_0 - V_t)^2}

where k is a process-dependent parameter and V_t is the
device threshold voltage. It should be noted that the clock
period, T_0, is typically set equal to the maximum propagation
delay, T_{pd}, in a circuit. The propagation delay of the
L-parallel filter is obtained from the same expression with
the supply voltage \beta V_0 and the clock period L T_0, which
fixes \beta. The power consumed by the L-parallel filter is
then given by

P_L = (L C_0)(\beta V_0)^2 (f_0 / L) = \beta^2 C_0 V_0^2 f_0 = \beta^2 P_0
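
A short numerical sketch may help show how the supply reduction factor follows from the two propagation delays. It assumes the standard delay model T_pd = C_0 V_0 / (k (V_0 - V_t)^2) for the original filter and L T_pd = C_0 \beta V_0 / (k (\beta V_0 - V_t)^2) for the L-parallel filter. Dividing the two gives the quadratic L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2. The function name and the voltage values below are hypothetical.

```python
import math


def supply_scaling_factor(L, v0, vt):
    """Solve L * (beta*v0 - vt)**2 = beta * (v0 - vt)**2 for beta.

    Written as a*beta**2 + b*beta + c = 0. The larger root is the one that
    keeps the scaled supply beta*v0 above the threshold voltage vt.
    """
    a = L * v0 * v0
    b = -(2 * L * v0 * vt + (v0 - vt) ** 2)
    c = L * vt * vt
    disc = b * b - 4 * a * c
    return (-b + math.sqrt(disc)) / (2 * a)


L, V0, VT = 2, 5.0, 0.6                     # illustrative values only
beta = supply_scaling_factor(L, V0, VT)
power_ratio = beta ** 2                     # P_L / P_0 = beta^2
```

With these illustrative values the two-parallel filter runs from roughly 60% of the original supply voltage, so its power drops to a little over a third of P_0 at the same sample rate.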


The proposed structures reduce the number of multiplications
in the sub-filter section by exploiting the inherent symmetry
of the coefficients, compared with the existing FFA fast
parallel FIR filter structure. To utilize the symmetry of
coefficients, the main idea behind the proposed structures is
intuitive: manipulate the poly-phase decomposition to obtain
as many sub-filter blocks with symmetric coefficients as
possible.


V. FIR FILTER STRUCTURE BASED ON THE SYMMETRIC 
CONVOLUTION 

This approach increases the throughput of FIR filters with
reduced-complexity hardware. It starts with short convolution
algorithms, which are transposed to obtain computationally
efficient parallel filter structures. Parallel FIR filters are
implemented by using poly-phase decomposition and fast FIR
algorithms (FFAs). The FFAs are iterated to get fast parallel
FIR algorithms for larger block sizes. Although the
small-sized parallel filter structures are computationally
efficient, the number of required delay elements increases
with the level of parallelism. However, the transpose of the
linear convolution structure is an optimal parallel FIR filter
structure in terms of the required delay elements. While the
Toeplitz-matrix factorization procedure buries additional
delay elements inside the diagonal sub-filter matrix, and
other algorithms place additional delay elements in the
post-addition matrix, the parallel FIR filtering structure
based on the transpose of the linear convolution structure
requires no additional delays inside the convolution matrix.
Furthermore, the delay elements in this transposed linear
convolution structure are regularly placed, and thus the
structure is more regular. A set of fast block filtering
algorithms is derived based on fast short-length linear
convolution algorithms to realize the parallel processing of
sub-filters.

However, when the convolution length increases, the number
of additions increases dramatically, which leads to complex
pre-addition and post-addition matrices that are not practical
for hardware implementation. Therefore, if fast convolution
algorithms could be used to decompose the convolution matrix
with simple pre-addition and post-addition matrices, a
computationally efficient parallel FIR filter with a reduced
number of required delay elements could be obtained.
Fortunately, the mixed-radix algorithm can be used, which
decomposes the convolution matrix with a tensor product into
two short convolutions. This algorithm is combined with fast
two-point and three-point convolution algorithms to obtain a
general iterated short convolution algorithm (ISCA). Although
fast convolution of any length can be derived from the
Cook-Toom algorithm or the Winograd algorithm, their
pre-addition or post-addition matrices may contain elements
not in the set {1, 0, -1}, which makes them unsuitable for
hardware implementation of the iterated convolution algorithm.
However, in both categories of method, when it comes to
symmetric convolutions, the symmetry of coefficients has not
yet been taken into consideration in the design of structures,
although it can lead to a significant saving in hardware cost.
In this paper, we provide new parallel FIR filter structures
based on FFA consisting of advantageous poly-phase
decompositions.
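
The idea of iterating a short convolution can be illustrated with a small Python sketch. It is not the mixed-radix ISCA itself; it only iterates a fast two-point convolution, which needs three multiplications instead of four, over sequences whose equal length is assumed to be a power of two. The function name is hypothetical.

```python
def iterated_conv(h, x):
    """Return (linear convolution, multiplication count) of h and x.

    Both inputs must have the same power-of-two length. Each level applies
    the 3-multiplication two-point step to the low and high halves.
    """
    n = len(h)
    if n == 1:
        return [h[0] * x[0]], 1
    half = n // 2
    h_lo, h_hi = h[:half], h[half:]
    x_lo, x_hi = x[:half], x[half:]
    # Pre-additions combine terms with coefficients from {1, 0, -1} only
    h_s = [a + b for a, b in zip(h_lo, h_hi)]
    x_s = [a + b for a, b in zip(x_lo, x_hi)]

    p_lo, m_lo = iterated_conv(h_lo, x_lo)
    p_hi, m_hi = iterated_conv(h_hi, x_hi)
    p_s, m_s = iterated_conv(h_s, x_s)

    # Post-additions recover the cross term without a fourth product
    cross = [s - a - b for a, b, s in zip(p_lo, p_hi, p_s)]

    out = [0] * (2 * n - 1)
    for i, v in enumerate(p_lo):
        out[i] += v
    for i, v in enumerate(cross):
        out[i + half] += v
    for i, v in enumerate(p_hi):
        out[i + n] += v
    return out, m_lo + m_hi + m_s


y, mults = iterated_conv([1, 2, 3, 4], [5, 6, 7, 8])
```

For this four-point example the iteration uses 9 multiplications, compared with 16 for the direct convolution, at the cost of extra pre-additions and post-additions.
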

A. Area Reduction Technique

In an effort to reduce the hardware cost below what is
achievable through application of the fast algorithms,
several area reduction techniques are used. The first area
reduction technique involves implementing the parallel filters
in a multiplierless fashion. It is widely known that
multiplication by a constant can be realized using only shifts
and additions [11, 9, 20, 10]. For example, Y multiplied by
X = 0.1110 (binary) can be implemented as
Y>>1 + Y>>2 + Y>>3, where >> denotes a shift to the right. By
using a dedicated shift-and-add implementation rather than a
general-purpose multiplier for the constant, the hardware cost
is significantly reduced. A general-purpose multiplier assumes
that all of the bits could be active during a multiplication
operation. In most cases, however, the constant multiplier
does not have all bits active, which implies that some of the
hardware in the general multiplier is not necessary. Since the
binary representation of the filter coefficients is known
prior to implementation, we know exactly which bits of each
coefficient will be active during a multiplication operation.
Therefore, we can implement each filter coefficient using
exactly the amount of hardware (shifts and additions) required
for that particular coefficient. Since all of the filter
coefficients are implemented using shifts and additions alone,
the entire parallel filter structure can be implemented using
only shifts, additions and delays (registers). In addition to
making the implementation significantly smaller, replacing
general-purpose multipliers with dedicated shift-and-add
multipliers allows the implemented parallel filtering circuit
to operate at higher clock rates.
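
The shift-and-add example above can be written as a minimal Python sketch. The function names are hypothetical, and Y is assumed to be a non-negative integer, so each right shift truncates any fractional part, as a fixed-point datapath without rounding would.

```python
def times_0p1110(y):
    """Multiply y by binary 0.1110 (= 1/2 + 1/4 + 1/8) with shifts and adds."""
    return (y >> 1) + (y >> 2) + (y >> 3)


def shift_add_multiply(y, frac_bits):
    """Multiply y by a constant given as a string of fractional binary digits.

    Only the bits equal to '1' contribute a shifted copy of y, so the amount
    of hardware follows the number of active bits in the coefficient.
    """
    acc = 0
    for pos, bit in enumerate(frac_bits, start=1):
        if bit == "1":
            acc += y >> pos
    return acc


product = times_0p1110(64)                 # 32 + 16 + 8
same = shift_add_multiply(64, "1110")
```

A coefficient with three active bits needs only two adders here, whereas a general-purpose multiplier would provide hardware for every bit position.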


VI. SIMULATION AND RESULT 



Fig 3: Output Waveform of FIR Filter

Fig 4: The schematic diagram of FIR filter

VII. CONCLUSION

The proposed parallel FIR digital filter structure reduces
hardware complexity and power consumption and is beneficial
for symmetric convolution. It uses poly-phase decompositions
that deal with symmetric convolutions, and it is better than
the existing FFA structure in terms of hardware consumption.
Exchanging multipliers for adders is profitable because a
multiplier costs more than an adder. Although the number of
adders increases, the number of multipliers saved grows as the
length of the FIR filter becomes large. In this paper we have
provided a new parallel FIR structure that takes advantage of
poly-phase decompositions dealing with symmetric convolution.
It gives better performance, consumes less power, and is more
area-efficient than the existing FFA structure.